I. INTRODUCTION


A. Purpose

The purpose of this monograph is to test some of the CER's in the Unmanned Spacecraft Cost Model for non-constant error variances, or heteroskedasticity, and to take corrective statistical action if and where the problem is found.

B. Background

The Unmanned Spacecraft Cost Model is a set of regression equations, or Cost Estimating Relationships (CER's), designed to explain the costs of spacecraft subsystems such as electrical power supplies, apogee kick motors, and communication electronics. Technical and performance characteristics are used to explain costs, and the model is based on a sample of 35 military, communications, weather, experimental, and lunar-probe spacecraft.

The model presents equations for explaining both first-unit recurring costs and total nonrecurring costs, using both "normalized" and "unnormalized" data. Normalized data are costs adjusted for "technology carryover" and "complexity of design," terms that account for the impact on cost of technological change and hardware sophistication. Unnormalized data, on the other hand, are costs in deflated but otherwise raw form.


Equations of the model, both normalized and unnormalized, are presently estimated independently of one another, using ordinary least squares (OLS) or nonlinear regression. On theoretical grounds, however, several improvements to the model may result from:

•  Testing equations for heteroskedasticity, and taking corrective action, if necessary

•  Estimating power-function regression equations using Goldberger's unbiased estimator [1] rather than OLS

•  Investigating alternative specifications of single equations

•  Determining the proper form of the random error term in each equation, e.g., additive or multiplicative, and then using this specification to drive the estimation technique

•  Estimating total spacecraft unit cost as a system of simultaneous equations


C. Scope

This paper, the first of five statistical monographs on the spacecraft model, is limited to the first area of research, i.e., testing equations for heteroskedasticity. And while no effort is made to gather cost, technical, and performance data on recently built satellites, the points illuminated here should be applicable to future model-building efforts.
II. TESTS FOR HETEROSKEDASTICITY


A. Explanation

A crucial assumption in regression analysis is that the spread of observations on a dependent variable around a population regression line is invariant with respect to changes in the value of an explanatory variable. Put another way, the variance of an equation's error term should be constant from one observation to another. When it isn't, the errors are called heteroskedastic, and the usual OLS standard errors are biased. Figure 1 illustrates the problem.


Heteroskedasticity in the spacecraft model, if present, could take either of two forms, at least in theory. First, the variance of unit costs might increase in proportion to the value of an explanatory variable such as subsystem weight. If the mean cost of a heavy system is much higher than the mean cost of a light one, for example, then the difference in magnitude between the two means may imply different variances, even when the relative spread about each mean is the same.1


On the other hand, the opposite case may hold. Namely, the unit costs of lightweight systems might be more volatile than those of heavyweight systems due to:

•  Rapid technological change in the aerospace industry in the early and mid 1960's, when many of the lightweight systems were built, thus inducing a large variance in costs

•  Efforts in some cases to pack a lot of technical performance into a lightweight package, thus driving costs above the norm


1 Let the mean cost of a lightweight Apogee Kick Motor (AKM) equal $100, and let the mean cost of a heavy one equal $1000. Next, assume that three values are observed, with identical spreads of ±10% about the mean in each case:

110, 100, 90 for the light AKM
1100, 1000, 900 for the heavy AKM

The sample variance is 100 in the first case but 10,000 in the second.
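The footnote arithmetic can be checked with Python's standard `statistics` module. This is a minimal sketch: the two lists are the hypothetical AKM cost observations from the example, and `statistics.variance` uses the N - 1 divisor, i.e., it computes the sample variance quoted above.

```python
from statistics import mean, variance

# Hypothetical AKM cost observations from the example: +/-10% about each mean
light_akm = [110, 100, 90]
heavy_akm = [1100, 1000, 900]

for label, costs in (("light AKM", light_akm), ("heavy AKM", heavy_akm)):
    # statistics.variance divides by N - 1, i.e., it is the sample variance
    print(f"{label}: mean = {mean(costs)}, sample variance = {variance(costs)}")

# Same relative spread, but the variance scales with the square of the mean
ratio = variance(heavy_akm) / variance(light_akm)
print(f"variance ratio = {ratio}")
```

A tenfold difference in mean cost, with the same ±10% spread, produces a hundredfold difference in variance.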


EXAMPLES OF HETEROSKEDASTICITY

In each of these graphs the dots represent ordered pairs of observations on Y and X, the dependent and explanatory variables in a simple linear relation. The lines represent population regression equations, which are almost always unknown. The vertical distance between a dot and a line is an observation on the error term.

Heteroskedasticity occurs when the variance of the regression equation's error term is not constant. In graph (a) the variance increases as values of X increase. In graph (b), on the other hand, an inverse relationship holds: the variance decreases as X increases.

FIGURE 1
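A small simulation can make the two patterns of Figure 1 concrete. This is an illustrative sketch only: the error scales (0.5X for graph (a), 50/X for graph (b)), the range of X, and the seed are arbitrary assumptions, not values from the spacecraft data. It compares the error variance over the lower and upper halves of the X range.

```python
import random
from statistics import pvariance

rng = random.Random(1979)  # arbitrary fixed seed so the illustration is reproducible
xs = list(range(1, 101))   # hypothetical values of the explanatory variable X

# Graph (a): error standard deviation grows with X (assumed scale 0.5 * X)
errors_a = [rng.gauss(0.0, 0.5 * x) for x in xs]
# Graph (b): error standard deviation shrinks as X grows (assumed scale 50 / X)
errors_b = [rng.gauss(0.0, 50.0 / x) for x in xs]


def spread_by_half(errors):
    """Return the error variance over the lower and upper halves of the X range."""
    half = len(errors) // 2
    return pvariance(errors[:half]), pvariance(errors[half:])


low_a, high_a = spread_by_half(errors_a)
low_b, high_b = spread_by_half(errors_b)
print(f"graph (a): low-X variance {low_a:.1f}, high-X variance {high_a:.1f}")
print(f"graph (b): low-X variance {low_b:.1f}, high-X variance {high_b:.1f}")
```

In case (a) the upper half of the X range shows the larger error variance; in case (b) the ordering is reversed.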


B. Tests

Park's test is used to determine which form of heteroskedasticity, if either, is present in the spacecraft model. The test, detailed in Appendix 1, is performed on all first-unit recurring cost CER's that are based on unnormalized data and for which a reasonable number of degrees of freedom is available.2 The null hypothesis in all cases is that an equation's error term is homoskedastic. The alternative hypothesis is that the error variance is related, either directly or inversely, to the magnitude of the explanatory variable.


To limit the scope of this study to a manageable size, two classes of CER's were not tested for heteroskedasticity:

•  Equations for estimating nonrecurring costs

•  Equations based on normalized data

Further, the test was not performed on subsystems with a paucity of observations:

•  Apogee Kick Motor for 1-Axis Satellites (sample size of 5)

•  Apogee Kick Motor for 3-Axis Satellites (sample size of 6)

•  Dispenser (sample size of 4)

Finally, inherently nonlinear equations of the model were estimated in power-function form, i.e.,

$Y = \alpha X^{\beta} + \varepsilon$ as $Y = \alpha X^{\beta} e^{\varepsilon}$.

And linear equations with Y-intercepts restricted to zero were estimated in unrestricted form, i.e.,

$Y = \beta X + \varepsilon$ as $Y = \alpha + \beta X + \varepsilon$.
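Taking natural logarithms shows why the power-function form with a multiplicative error can be fitted by OLS on the logged data, whereas the additive-error form $Y = \alpha X^{\beta} + \varepsilon$ cannot be linearized this way. It is also why Table 2 reports estimates of $\ln\alpha$ rather than $\alpha$ for the two attitude-control CER's:

```latex
\begin{aligned}
Y      &= \alpha X^{\beta} e^{\varepsilon} \\
\ln Y  &= \ln\alpha + \beta \ln X + \varepsilon
\end{aligned}
```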
As Table 1 shows, the null hypothesis of homoskedasticity is rejected for three of the sixteen equations examined:

(1) Attitude Control

(2) Attitude and Reaction Control

(3) Program Level (as a function of platform)

And as Figures 2 through 4 illustrate, the spread of regression residuals is inversely related to the magnitude of X in the first two CER's, and directly related in the last.


TABLE 1

RESULTS OF THE PARK TEST FOR HETEROSKEDASTICITY
(Unit-Cost Equations Based on Unnormalized Data)

Equation                                      Sample Size    t-Statistic

Structure, Thermal Control, and Interstage        [?]            0.607
Telemetry, Tracking, and Command (TT&C)            29           -1.557
Communications                                     15           -0.197
Communications Antennas                            12           -2.199
Communications Electronics                        [?]             [?]
Combined Communications and TT&C                   15           -0.403
Attitude Control                                   30           -2.373*
Attitude Determination                            [?]           -0.196
Attitude and Reaction Control                      16           -3.15[?]*
Power Supply (subsynchronous altitude)             11           -0.660
Power Supply (synchronous altitude)                19           -0.185
Platform (without mission equipment)               31           -1.156
Program Level (as a function of platform)          30             [?]*
Program Level (communications satellites)          15            1.63[?]
LOOS (for satellites with an AKM)                  12            1.604
LOOS (for satellites without an AKM)               10           -0.351

NOTE: Figures marked with an asterisk (*) represent cases where the null hypothesis of homoskedasticity is rejected at the 5% level of significance using the two-tailed t-test. Entries shown as [?] are illegible in the source.
FIGURES 2 THROUGH 4 (first-unit recurring cost plotted against weight of the system for the Attitude Control and Attitude & Reaction Control CER's, and against first-unit platform cost for the Program-Level CER; * = observed datum)


III. THE GLS REMEDY

A. General

The brute and blind mechanical nature of Ordinary Least Squares (OLS) gives excessive weight to observations on Y that are associated with large error variances. In the Attitude and Reaction Control CER (Figure 3), for example, the position of the least-squares line is governed inordinately by those data points that are most spread out, i.e., by those associated with relatively lightweight systems. OLS estimates of regression parameters are consequently no longer of minimum variance, although they do remain unbiased.3

Generalized Least Squares (GLS) is a statistical technique that alleviates the problem of heteroskedasticity in a regression equation. It transforms the observations on Y and X so that the variance of the equation's error term is once again constant, as Appendix 2 details.

B. GLS Estimates

GLS estimates of the parameters in the three CER's are compared to their OLS counterparts in Table 2. Differences are small for the first CER but substantial for the remaining two.

In the Attitude and Reaction Control equation, as Figure 3 shows, the OLS regression line seems a little too steep (a slope of 1.172 versus 0.940 for GLS), with its position inordinately influenced by the outlier in the southwest quadrant of the chart. And in the Program-Level Cost CER, as Figure 4 shows, the ordinary least-squares line again seems too steep (0.480 versus 0.414), with the northeastern outlier appearing particularly pernicious.4


3 See Kmenta [2] for a detailed explanation.

4 Excluding these recalcitrant data points from their respective samples and then re-estimating using OLS gives values close to those obtained by GLS in the case of the Program-Level CER, but not the Attitude and Reaction Control CER. In the latter, the revised OLS line is flatter than the GLS line by a fair margin.

In either event, however, GLS is preferred: it uses all the sample data and has optimal statistical properties. Moreover, the outliers are only partly to blame for the bugaboo of heteroskedastic disturbances; indeed, they are symptomatic of the problem.


TABLE 2

COMPARISON OF OLS AND GLS ESTIMATES
(t-statistics in parentheses)

                                    OLS Estimates             GLS Estimates
CER/Summary Statistics            ln α          β           ln α          β

ATTITUDE CONTROL                  3.370       0.945         3.265       0.967
                                 (9.055)    (11.090)       (6.073)     (8.633)
   R-Squared                      0.814                     0.997
   F-Statistic                  122.882                  4847.543
   DW Statistic                   2.711                     2.390

ATTITUDE & REACTION CONTROL       1.559       1.172         2.630       0.940
                                 (1.308)     (4.261)       (3.097)     (5.528)
   R-Squared                      0.565                     0.996
   F-Statistic                   18.159                  1873.922
   DW Statistic                   1.761                     2.584

PROGRAM-LEVEL COST             -338.815*      0.480       184.619*      0.414
                                (-0.493)     (6.681)       (0.511)     (6.557)
   R-Squared                      0.615                     0.792
   F-Statistic                   44.635                    53.449
   DW Statistic                   1.242                     1.208

* These are estimates of α rather than ln α.

NOTES: 1. Summary statistics and t-values for GLS estimation are from the transformed GLS equation, i.e., the equation with values of Y and X adjusted to yield an error term with constant variance (see Appendix 2).

2. Further, the mechanics of GLS require that the Y-intercept of the transformed equation be restricted to zero. Hence, each GLS R-Squared statistic shown above is computed about a mean of zero.

3. Comparison of OLS and GLS R-Squareds or F's is invalid, since they are based on regressions using two different dependent variables.


C. Cost Comparison

Cost estimates based on GLS are compared to their OLS counterparts in Table 3 at four values of each explanatory variable: the sample mean of X, 50% below and 50% above the mean, and 300% above the mean. This last value is included to capture the frequent case where a cost estimate is needed for a proposed piece of hardware whose weight (or other explanatory variable) lies outside the range of the values used to estimate the CER.

GLS and OLS predictions differ the most for values of X far from the mean, with the percentage delta increasing in absolute value as X becomes relatively small or relatively large. This isn't surprising, since the GLS and OLS regression lines intersect near the average value of X in all three CER's, as Figures 2 through 4 show.


TABLE 3

COMPARISON OF GLS AND OLS COST ESTIMATES*
(Costs are in thousands of FY79 constant dollars)

                                               Predicted Cost
CER                    Value of X            GLS          OLS     Delta (OLS - GLS)   %Delta (of GLS)

ATTITUDE CONTROL       0.5*Mean      52.5   $1206.0      $1227.8        $21.8               1.8%
                       Mean         105.0   $2357.6      $2363.7         $6.1               0.3%
                       1.5*Mean     157.6   $3491.5      $3469.5       -$22.0              -0.6%
                       4.0*Mean     420.0   $9008.5      $8760.8      -$247.7              -2.7%

ATTITUDE & REACTION    0.5*Mean      47.6    $523.8       $439.8       -$84.0             -16.0%
CONTROL                Mean          95.3   $1005.9       $992.1       -$13.8              -1.4%
                       1.5*Mean     142.9   $1472.1      $1595.0       $122.9               8.3%
                       4.0*Mean     381.2   $3702.4      $5037.1      $1334.7              36.0%

PROGRAM-LEVEL COST     0.5*Mean    4046.3   $1859.8      $1603.4      -$256.4             -13.8%
                       Mean        8092.7   $3535.0      $3545.7        $10.7               0.3%
                       1.5*Mean   12139.1   $5210.2      $5488.0       $277.8               5.3%
                       4.0*Mean   32370.8  $13586.1     $15199.2      $1613.1              11.9%

*  All  values  are  in  unlogged  form. 
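The point predictions in Table 3 can be regenerated from the Table 2 coefficients. This sketch assumes, as the table's figures appear to confirm, that the power-function CER's were unlogged simply as exp(ln α)·X^β with no retransformation adjustment, that Delta is OLS minus GLS, and that %Delta is Delta as a percentage of the GLS estimate. The dictionary layout and the names `predict` and `compare` are illustrative only.

```python
import math

# Coefficients transcribed from Table 2 as (first coefficient, beta). For the two
# attitude-control CER's the first coefficient is ln(alpha) of Y = alpha * X**beta;
# for the Program-Level CER it is the intercept alpha of Y = alpha + beta * X.
CERS = {
    "Attitude Control": ("power", {"OLS": (3.370, 0.945), "GLS": (3.265, 0.967)}),
    "Attitude & Reaction Control": ("power", {"OLS": (1.559, 1.172), "GLS": (2.630, 0.940)}),
    "Program-Level Cost": ("linear", {"OLS": (-338.815, 0.480), "GLS": (184.619, 0.414)}),
}


def predict(cer, method, x):
    """Unlogged point prediction, in thousands of FY79 dollars."""
    form, estimates = CERS[cer]
    a, b = estimates[method]
    if form == "power":
        # exp(ln alpha) * X**beta, with no retransformation-bias adjustment
        return math.exp(a) * x ** b
    return a + b * x


def compare(cer, x):
    """Return (GLS, OLS, Delta, %Delta) with Delta = OLS - GLS."""
    gls = predict(cer, "GLS", x)
    ols = predict(cer, "OLS", x)
    delta = ols - gls
    return gls, ols, delta, 100.0 * delta / gls


for cer, x in [("Attitude Control", 105.0),
               ("Attitude & Reaction Control", 381.2),
               ("Program-Level Cost", 8092.7)]:
    gls, ols, delta, pct = compare(cer, x)
    print(f"{cer:28s} X={x:>8.1f}  GLS {gls:9.1f}  OLS {ols:9.1f}  "
          f"Delta {delta:8.1f}  {pct:6.1f}%")
```

For example, `compare("Attitude & Reaction Control", 381.2)` should reproduce the last Attitude & Reaction Control row of Table 3 to within the rounding of the published coefficients.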


IV. CONCLUSION

A. Summary

Sixteen CER's of the Unmanned Spacecraft Cost Model were tested for non-constant error variances, or heteroskedasticity. Based on Park's two-tailed t-test, the null hypothesis of homoskedasticity was rejected in three cases:

•  Attitude Control

•  Attitude and Reaction Control

•  Program-Level Cost

Generalized Least Squares (GLS) was invoked to provide best, linear, unbiased (BLU) estimation. Differences between GLS and OLS estimates of regression-equation parameters were profound in the last two CER's.

B. Recommendations

Based on the foregoing analysis, this study recommends:

1. Using GLS instead of OLS when heteroskedastic disturbances are suspected

2.  Using  observations  on  spacecraft  unit  costs  from  outside 
current  NCD-5  samples  to  compare  the  predictive  accuracy 
APPENDIX 1

PARK'S TEST FOR HETEROSKEDASTICITY

A simple linear equation of the spacecraft model is

(1) $Y_i = \alpha + \beta X_i + u_i \quad (i = 1, 2, \ldots, N)$, where

$Y$ = first-unit hardware cost
$X$ = hardware weight
$u$ = a randomly distributed error term.

Further, $\alpha$ and $\beta$ are population parameters to be estimated, and $N$ is the number of spacecraft in the sample.

To test for heteroskedasticity, Park [3] proposes using

(2) $\mathrm{Var}(u_i) = \delta X_i^{\gamma} e^{\varepsilon_i}$, where

$\delta$ = an unknown constant
$\gamma$ = a population parameter measuring the degree of heteroskedasticity
$\mathrm{Var}(u_i)$ = the variance of $u_i$ in equation (1)
$\varepsilon_i$ = a well-behaved random error term.

For values of $\gamma$ statistically different from zero, the error term in equation (1) will be heteroskedastic, since $\mathrm{Var}(u_i)$ will change as $X_i$ changes.

To estimate $\gamma$, the squared residuals $\hat{u}_i^2$ from OLS estimation of equation (1) are used as proxies for observations on $\mathrm{Var}(u_i)$ in equation (2). Taking logs,

(3) $\ln(\hat{u}_i^2) = \ln\delta + \gamma \ln(X_i) + \varepsilon_i$,

with the significance of $\hat{\gamma}$ examined using the two-tailed t-test.
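A minimal pure-Python sketch of equation (3) follows, assuming the OLS residuals from equation (1) have already been computed and that every $X_i$ is positive and every residual nonzero. The function name `park_test` is hypothetical, and the critical t value must be looked up separately.

```python
import math


def park_test(x, residuals):
    """Park's test: OLS regression of ln(u_hat_i^2) on ln(X_i), equation (3).

    Returns (gamma_hat, t_statistic, degrees_of_freedom). Compare |t| with the
    two-tailed critical value of Student's t at N - 2 degrees of freedom.
    """
    n = len(x)
    if n != len(residuals) or n < 3:
        raise ValueError("need at least three paired observations")
    lx = [math.log(v) for v in x]              # ln X_i; requires X_i > 0
    ly = [math.log(u * u) for u in residuals]  # ln(u_hat_i^2); requires u_hat_i != 0
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    gamma_hat = sxy / sxx                      # slope: estimate of gamma
    ln_delta = my - gamma_hat * mx             # intercept: estimate of ln(delta)
    ssr = sum((b - ln_delta - gamma_hat * a) ** 2 for a, b in zip(lx, ly))
    s2 = ssr / (n - 2)                         # residual variance of equation (3)
    se_gamma = math.sqrt(s2 / sxx)             # standard error of gamma_hat
    return gamma_hat, gamma_hat / se_gamma, n - 2
```

In use, a call such as `gamma, t, df = park_test(weights, resid)` (both names hypothetical) would reject homoskedasticity at the 5% level when |t| exceeds the tabulated critical value, e.g., 2.048 at 28 degrees of freedom, the case of a 30-observation CER.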
APPENDIX 2

GENERALIZED LEAST SQUARES

Using results from Park's test of Appendix 1,

(4) $\mathrm{Var}(u_i) = \delta X_i^{\hat{\gamma}}$, or in words,

the variance of the random error term in equation (1) is related to the value of the explanatory variable, $X_i$.

Generalized Least Squares (GLS) is implemented by:

•  Dividing equation (1) by $X_i^{\hat{\gamma}/2}$, denoted $w_i$ for simplicity, to obtain

$Y_i/w_i = \alpha/w_i + \beta X_i/w_i + u_i/w_i$,

an equation whose error term has constant variance

•  Estimating this equation using OLS, with the term $1/w_i$ regarded as a second explanatory variable, and with the Y-intercept restricted to zero.

Since the transformed error term is of constant variance, i.e.,

$E(u_i/w_i)^2 = \mathrm{Var}(u_i)/w_i^2 = \delta$,

the Gauss-Markov theorem now applies, and least-squares estimates are best, linear, unbiased (BLU).
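The two steps above can be sketched in pure Python, assuming $\hat{\gamma}$ has already been obtained from Park's test and every $X_i$ is positive. The function name `gls_fit` is hypothetical; the zero-intercept OLS fit on the two transformed regressors is solved directly from its 2 x 2 normal equations.

```python
def gls_fit(x, y, gamma_hat):
    """GLS for Y_i = alpha + beta*X_i + u_i with Var(u_i) = delta * X_i**gamma_hat.

    Each observation is divided by w_i = X_i**(gamma_hat/2); the transformed
    equation Y/w = alpha*(1/w) + beta*(X/w) is then fitted by OLS with no
    intercept, treating 1/w as a second explanatory variable.
    """
    w = [xi ** (gamma_hat / 2.0) for xi in x]
    z1 = [1.0 / wi for wi in w]                  # regressor carrying alpha
    z2 = [xi / wi for xi, wi in zip(x, w)]       # regressor carrying beta
    yt = [yi / wi for yi, wi in zip(y, w)]       # transformed dependent variable
    # Normal equations of the zero-intercept, two-regressor OLS fit
    s11 = sum(a * a for a in z1)
    s22 = sum(b * b for b in z2)
    s12 = sum(a * b for a, b in zip(z1, z2))
    s1y = sum(a * c for a, c in zip(z1, yt))
    s2y = sum(b * c for b, c in zip(z2, yt))
    det = s11 * s22 - s12 * s12
    alpha = (s1y * s22 - s2y * s12) / det        # Cramer's rule
    beta = (s2y * s11 - s1y * s12) / det
    return alpha, beta
```

With `gamma_hat = 0` every weight is 1, and the fit collapses to ordinary least squares with an intercept, which makes a quick sanity check.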